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Abstract 

In this paper, we propose a so-called probabilistic non-local means (PNLM) method for image 
denoising. Our main contributions are: 1) we point out defects of the weight function used in the classic 
^ NLM; 2) we successfully derive all theoretical statistics of patch-wise differences for Gaussian noise; 

and 3) we employ this prior information and formulate the probabilistic weights truly reflecting the 
similarity between two noisy patches. The probabilistic nature of the new weight function also provides 
a theoretical basis to choose thresholds rejecting dissimilar patches for fast computations. Our simulation 
'f. ' results indicate the PNLM outperforms the classic NLM and many NLM recent variants in terms of peak 

signal noise ratio (PSNR) and structural similarity (SSIM) index. Encouraging improvements are also 
z/i found when we replace the NLM weights with the probabilistic weights in tested NLM variants. 
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Non-local means (NLM) is a popular data-adaptive image denoising technique introduced by Buades 

^ et al. |[T|, Q. This technique is proven to be effective in many image denoising tasks. In the classic 

^ NLM, a 2D clean image x = {a;i}igi defined on the spatial domain I is assumed to be contaminated by 

c5 i.i.d. zero-mean Gaussian noise with an unknown variance a^, i.e. 

yi = xi + ni, and ni J\f{0,a'^). (1) 

where yi, xi and ni denote the noisy observation, the clean image pixel and the pixel noise, respectively. 
The NLM then estimates the clean pixel xi by using a weighted sum of the noisy pixels within a prescribed 
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search region S, typically a square or a rectangular region: 

= Y.k&i wi,kyk/Wi (2) 

where each weight is computed by quantifying the similarity between two local patches (defined as P) 
around noisy pixels yi and as shown in Eq. Q, 

wi,k = exp ( - Y^jevivi+j " Vk+jf/h) (3) 
and the summation of all weights is denoted as 

Wi = Ekes, ^i,k- (4) 

Although the original NLM weight |[T|, |[2| includes a weak Gaussian smoother, the weight ([3]) is a 
simplified version with similar performance that is also widely accepted in the NLM community 

Within the NLM framework, much progress has been made in recent years. Some authors have focused 
on fast NLM implementation Q, Q, while others have explored NLM parameter optimization Q, or 
have adjusted the NLM framework to achieve better performance Q. We notice that one shared interest 
of these three topics is the weight function of the NLM, which is the core of the NLM algorithm. 
Calculation of NLM weights is the most computationally expensive part of the algorithm and is related 
to many parameter optimization schemes. It has long been noticed that the NLM weight function is 
somewhat inadequate [7] because it tends to give non-zero weights to dissimilar patches. However, the 
reason behind this inadequacy has not fully explored. 

In this letter, we focus on the NLM weight function and propose a new weight under a probabilistic 
framework. The rest of the paper is organized as follows: Sec. II shows the defects of the NLM weights; 
Sec. Ill proposes our PNLM framework with new probabilistic weights; Sec. IV shows simulation results; 
and we conclude the letter in Sec. V. 

II. Problems with the NLM Weight Function 

The NLM weight function Q is considered as w/^fc = exp(— Z); where h' is a translation of the 

temperature parameter /i in ([3]) and 

Di,k = Ejevivi+j - yk+jf/'i(y'^ (5) 

is the patch difference between the patches around yi and y^. In this way, ([5]) can be interpreted as the 
standard quantitative test to measure the similarity of the two samples jsj. The statistical interpretation 
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of the exponential function used in Q is not straightforward |[8|, although may be possible to relate it to 
Gaussian kernels used in probability density estimation. Nevertheless, this exponential function gives a 
larger weight to a pixel with a smaller patch difference (Fig. 1(a)). Intuitively, this idea is quite reasonable, 
as it means that the NLM relies more on pixels with smaller patch differences. However, we demonstrate 
below that this exponential function makes the NLM weights somewhat problematic. 
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Fig. 1: NLM weight function and the distribution of 7x7 patch differences, (a) the NLM weig ht for h ^ cr^ |p|. 
(b) the distribution of disjoint patch differences. Red and green circles denotes two equal probable patch differences, 
while are biasedly weighted in NLM. 

From now on, we consider Z); ^ as a random variable (r.v.) and assume patches around xi and Xk 
match perfectly, i.e. 

J2jGFixi+j-Xk+jf = 0. (6) 

If they are disjoint, then -D^ fc~x^p|' where |-| denotes the cardinality function. Fig. 1 shows this distribution 
on the right, with the corresponding NLM weight function on the left. It is clear that the NLM weight 
function gives two equally probable Di^i^s very different weights and that it fails to give the largest weight 
to the most probable case. For Di ^s close to its expected value, the weight errors are not too large because 
the corresponding region in the exponential curve is almost linear with a moderate slope. However, for 
Di j^s far away from its expected value, weight errors are very large because the NLM function tends 
to give nonzero weights to these highly improbable cases, so weight errors grow quickly. In practice, 
correcting over-weighted weights has been shown to improve NLM performance. For example, the center 
pixel weight (CPW) in NLM is unitary and thus over-weights center pixels. |[9|, |10| report noticeable 
improvement just by tuning these over-weighted CPWs. 



February 26, 2013 



DRAFT 



4 



III. Probabilistic Non-Local Means 

Instead of including the exponential function in weighting pixels, we propose the following probabilistic 
weight 

Wl,k = fl,k{DLk/P^) (J) 

where fi^k{-) is the theoretical probability density function (p.d.f.) of the r.v. Di^k, is the patch 
difference using estimated variance a"^ in ([5]), and p is a tuning parameter. This weight function then can 
be interpreted as the probability of seeing a noisy patch difference when we know the two clean patches 
match perfectly. 

A. Theoretical Distribution of Patch-wise Distance 

Pretend we know the true noise variance (later we will show this knowledge is unnecessary). Our 
goal is to derive the theoretical p.d.f. of the patch difference when the two clean patches around pixel 
xi and Xk are perfectly matching (see Q). To do so, we denote the pixel distance di^u as 

di,k = {yi-yk?/2a^ (8) 

and thus we have Di ^ of the form that 

Dl,k = Y.j&p'il+j,k+j- (9) 



Because two patches are perfectly matching and noise is i.i.d., for all j G P we have 

- n- i 
2(t2 



ai+j,k+j = ^-n Xi- 



If all di^j^k+j^ are i.i.d., then we have -Di.A;~X^p|' whose mean is |P| and variance is 2|P|. However, the 
i.i.d. assumption does not hold when the two patches overlap, as is the case for many pairs of patches. 



Fortunately, it is known that such a summed correlated distribution can be well approximated 1 1 1 1 as 
follows, 

Di,k ~ Ikxi (10) 
where parameters 7^ and % can be determined by the first two cumulants of Di ^ |11| as shown below. 

7fe = var[A,fc]/(2E[A,fe]) (11) 
?7fc = E[A,fe]/7fc (12) 
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The cumulant E[Di^k] is straightforward to find, and it is 



E[A,fc]=E,epE[d,+,-fc+,-]=|P|. 
With regards to var[D/ , the following identity always holds 

var[A,A:] = ^ij(,f.cov[di+i^k+i, di+j^k+j] 
where the covariance can be written as follows. 



(13) 



(14) 



coY[di+i^k+i,di+j,k+j] =^[di+i,k+idi+j,k+j] — fJ'di.k 



(15) 



This equation compares two pairs of r.v.s, N| ^={n/+j, Uk+i} and j^={ni^j, rik+j}, of which 0, 1 or 2 
may be repeated. Thus, 



^[di+i,k+idi-\-j^k+j] 



3,ifK,nNf^,|=2 
1.5,if|Ni^,nNi;j = l (16) 

hmik^Kk\ = ^ 

found as E[d;+j fc+jd^+j fc+j] measures kurtosis if both pixels are repeated; pixels are i.i.d. if distinct; and 
by expanding terms if one pixel is repeated. This implies that 



2,ifK,nN;;,| = 2 

o.5,if|Nj^,nN^;,| = i 

0,if|Ni.nN^,, 



(17) 



var[Z'; fc] is thus dependent on the number of terms for which |N; 



i,k '^^/,fci 



-2 and for which iNj/lN^^I 



Case |NJ^nN;^^| = 2 happens only when i = j, and so the number of overlapped pixels is |P|. Case 
|N; /TN; = 1 happens only when two patches overlap, and the corresponding number is the same as the 
number of overlapping pixels. Letting O^ ^ be the set of overlapping pixels, var[D; fc] can be written as 



var[A,fc] = 2|P| + \Oi,k\ 



(18) 



and is known once k is given. As a result, 7^ and % in ( [TT) and ( [12] ) can be determined, implying the 
p.d.f. of Di^k is 



fi,kiD) = xiiDhk): 



exp(-D/27fc) 



2^'=/2r(r/fc/2) 



(19) 
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The different spatial relationships of patch pairs imply different |0;^fc|, thus causing different var[Z); jt], 
7fc, and finally p.d.f. fc. This conclusion means that NLM weights calculated without considering spatial 
correlations are inadequate. 

fi^k also provides a natural criterion for speeding computation by rejecting over-dissimilar patches at 
an early stage (see similar usage in |4|). Thresholds can be set by finding critical values and 
under a prescribed significance level a such that 

Fr{Dl- < D < D*+\fi^k) = a. (20) 

B. Parameters Discussions 

Above we did not use fi^kC^i,k) as our weight function, but instead used fi^k{Di^k/ p"^)- The parameter 
provides a way to adjust our probabilistic model when an estimated variance Jr^ is used instead of 
the true u^. When 

p2 = aV?' (21) 
reflects the ratio of the true noise variance to the estimated one, all previous derivations hold. Thus 

di,k = {yi - ykf/2a^ = p\yi - ytf /2a^ ~ Xp^ (22) 

indicating that £[5;,^] = p^\¥\ and \!a[Di^k] = yO^(2|P| + |Oi,fc|)- Further, this implies that the actual 
parameters used are ^k = P^lk and ffk = rjk. Finally, these results lead to 

IkiD) = xi{D/%) = xliD/ip^^k)) = fiAD/p") 

which is the weight function given in Q. 

The raw probabilistic CPW wi^i = fi^ki^) ~ under- weights a noisy center pixel. A more plausible 
CPW is 

wi,i = x\rim)- (23) 
which is the same as the weight of the most probable case. This CPW is used for the rest of the letter. 

IV. Simulation Results 

All of the following simulations are done under the MATLAB r2012b environment. Our two goals are 
1) to show that the derived p.d.f. /; ^ in ^T9\ closely approximates its true p.d.f.; and 2) to confirm the 
superiority of the proposed probabilistic weights and PNLM. 
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Fig. 2 shows the var[Z); fc] map for a 7x7 search region with 3x3 patches and the six typical theoret- 
ical p.d.f.s // fe, plotted with the corresponding sample distributions estimated from 100,000 realizations. 
It is noticeable that the var[L'i fc] map is location-dependent and isotropic with one of the six theoretical 
values {18,19,20,21,22,24}. The more pixels overlap, the larger var[D; fc] is, implying a smaller peak on 
its p.d.f. It is clear that the predicted p.d.f.s are very close to those estimated from a large number of 
samples. 

Since it is clear that the accuracy of the ^ approximation degrades as correlation increases, the 
approximation accuracy of the most-overlapped cases can be used to characterize the worst-case accuracy. 
For each combination of search region and patch size P, there are four possible ks that attain the 
maximum correlation, all of which are one pixel away from the center pixel (see examples for /c=18, 
24, 26, and 32 on Fig. 2-(a)). In Table I, we report the averaged P-values of goodness of fit tests for 
the most correlated /;,fcS, where each P-value is the averaged from P-values of the four most correlated 



fcS. Because all observed P-values are above 5%, we say the approximated theoretical p.d.f. ( [T9| ) gives 

satisfactory predictions, so these p.d.f.s can reliably be used to quantify patch similarities. 

TABLE I: Averaged P-values of goodness of fit tests for the observed sample distributions. 
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In the following simulation, we compare the three pairs of NLM and PNLM algorithms, namely 1) 
die classic NLM and the proposed PNLM, 2) the classic NLM with the James-Stein Shrinkage (JSNLM) 



1 10 1 and the proposed PNLM with the James-Stein Shrinkage (PSJNLM), and 3)the nonlocal median 
with the classic weights (NLEM) |j6| and the nonlocal median with the probabihstic weights (PNLEM). 
The only difference between the two algorithm in each pair is the weight function. With regards to the 
parameter settings, we use patch size 7 and search region size 21 for all methods. For the temperature 
parameter h in NLMs, we use /i = |P|cj^, which is nearly optimal and suggested in ||3|. For p in PNLMs, 
we use yO = 1. To quantify the quality of a denoised method, we compute the average PSNR ||3] and 



SSIM 1 12 1 scores from 10 realizations for each method and each noise level. These results are reported 
in Table II. 

From Table II, it is clear that 1) the proposed PNLM method outperforms the NLM method and those 
recent variants like NLEM and JSNLM; and 2) by replacing the NLM weight with the new proposed 
probabilistic one, both NLEM and JSNLM are improved in terms of higher PSNR/SSIM scores. Fig. 3 
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Fig. 2: Theoretical and estimated p.d.f. /;,fcS for 7x7 search region and 3x3 patches, (a) theoretical var[£'/jj] map 
(white indices indicates fcs in S;)- (b)-(g) theoretical (red dash lines) and estimated (blue bars) p.d.f. /; ^.s for fc = 8 

(var[i:>/,8]=18), fc = 9 (var[A,9] = 19),fc = 10 (var[A,io] =20),fc= 11 (var[A.ii] = 21), fc=17 (var[A,i'7] = 22) and 
fc = 18 (var[£';_i8] =24), respectively. 

gives example denoising results and method noise images of the NLM and PNLM algorithms. These 
results show that the effectiveness of the new proposed probabilistic weight and the superiority of the 
PNLM framework. 



V. Conclusion 

In this letter, we pointed out the insufficiency of the NLM weights and showed a new promising PNLM 
framework, whose weights better reflect patch similarities. The proposed PNLM framework connects the 
denoising process and the noise type and thus is meaningful for denoising other types of noise. As 
long as a noise p.d.f. is known, we can estimate ^ correspondingly. In this way, a universal denoising 
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TABLE II: Performance comparisons for NLM and PNLM methods 

PSNR(dB)\ o- I 10 20 30 40 50 60 70 80 90 100 

a NLM 32.57 28.92 26.98 24.98 23.52 22.52 21.84 21.24 20.82 20.44 

I PNLM 32.47 29.08 27.44 26.26 25.19 24.13 23.26 22.44 21.84 21.31 

g NLEM 32.66 28.90 26.63 24.78 23.35 22.16 21.78 21.31 20.95 20.55 

I PNLEM 33.06 29.42 27.36 25.72 24.90 23.87 23.11 22.28 21.73 21.13 

JSNLM 32.64 29.01 27.13 25.46 24.12 23.10 22.33 21.61 21.09 20.63 

PSJNLM 32.21 29.08 27.45 26.16 25.07 24.05 23.20 22.40 21.77 21.23 

NLM 34.08 31.30 28.79 26.88 25.62 24.66 23.85 23.31 22.90 22.45 

« PNLM 34.92 32.40 30.48 28.70 27.25 26.14 24.98 24.17 23.57 22.98 

I NLEM 34.30 30.43 27.80 26.53 25.51 24.88 24.13 23.52 22.92 22.43 

•« PNLEM 34.56 31.97 30.24 28.64 27.07 26.11 24.90 24.14 23.38 22.94 

JSNLM 34.62 31.70 29.29 27.30 25.94 24.87 23.98 23.35 22.86 22.37 

PSJNLM 34.81 32.38 30.36 28.58 27.11 25.89 24.82 23.99 23.36 22.75 

NLM 33.74 30.91 28.72 27.14 26.03 25.13 24.42 23.88 23.44 23.03 

PNLM 34.59 32.07 30.17 28.58 27.32 26.23 25.33 24.59 23.98 23.43 

I NLEM 33.56 30.00 28.41 27.30 26.47 25.65 25.05 24.29 23.64 23.12 

•S PNLEM 33.78 31.21 29.64 28.32 27.26 26.28 25.58 24.80 24.20 23.69 

JSNLM 34.38 31.41 29.21 27.49 26.26 25.26 24.47 23.85 23.33 22.86 

PSJNLM 34.72 32.07 30.09 28.48 27.20 26.07 25.15 24.38 23.74 23.15 

NLM 39.04 33.80 30.95 28.94 27.37 25.94 24.45 23.25 21.84 20.87 

PNLM 40.34 35.17 32.31 30.26 28.38 26.71 25.37 24.50 23.29 22.76 

NLEM 39.68 34.13 30.90 28.94 27.07 25.62 24.59 23.54 22.66 22.09 

■« PNLEM 39.72 34.49 31.25 29.18 27.15 25.77 24.68 23.78 22.93 22.52 

JSNLM 39.03 33.79 30.93 28.93 27.35 25.91 24.42 23.22 21.81 20.84 

PSJNLM I 34.64 30.86 31.23 29.73 27.95 26.39 25.08 24.25 23.10 22.60 

SSIM(%)\ o- I 10 20 30 40 50 60 70 80 90 100 

g NLM 91.08 82.92 78.50 73.87 68.97 64.18 59.78 55.58 51.87 48.69 

1 PNLM 91.64 84.65 80.23 76.61 73.26 69.72 66.18 62.72 59.45 56.60 

2 NLEM 88.68 80.24 72.53 64.98 59.15 53.04 48.67 43.96 40.45 35.95 
I PNLEM 91.16 83.22 78.13 73.31 68.89 63.11 58.90 54.51 50.54 46.18 
S JSNLM 91.23 84.32 78.91 73.63 68.74 64.01 59.62 55.34 51.36 48.23 

PSJNLM 89.69 84.04 79.29 74.97 71.01 66.98 63.13 59.30 55.61 52.82 

NLM 87.63 83.77 79.88 75.11 70.63 66.34 62.16 58.30 54.86 51.40 

« PNLM 89.38 85.00 81.72 78.18 74.70 71.05 67.43 63.99 60.81 58.05 

I NLEM 88.06 81.79 75.43 69.11 63.09 57.09 51.09 45.81 41.19 37.04 

■« PNLEM 89.17 84.11 79.92 75.17 70.40 65.40 59.92 54.59 49.58 46.80 

JSNLM 89.12 84.14 79.54 74.52 69.78 65.19 60.80 56.94 53.33 49.86 

PSJNLM 89.34 84.57 80.64 76.45 72.35 68.03 63.80 60.11 56.49 53.34 

NLM 87.86 83.98 79.39 74.90 70.72 66.72 62.87 59.23 55.82 52.59 

Q PNLM 89.69 85.00 81.18 77.56 74.16 70.78 67.57 64.51 61.44 58.81 

I NLEM 88.21 81.19 75.44 69.48 63.71 57.57 52.22 46.92 42.29 38.54 

PNLEM 89.56 84.02 79.03 74.05 69.42 64.64 60.09 55.85 51.87 48.25 

JSNLM 89.37 83.94 79.16 74.48 69.95 65.67 61.61 57.86 54.32 50.99 

PSJNLM 89.75 84.64 80.21 76.00 71.93 67.97 64.16 60.70 57.23 54.11 

NLM 99.01 97.38 95.35 93.23 90.75 87.94 84.22 80.76 76.31 72.00 

5; PNLM 99.33 98.32 97.03 95.68 93.82 91.79 89.48 87.61 84.87 82.95 

"I NLEM 99.10 97.67 95.46 92.65 89.20 85.12 81.90 78.73 74.26 71.71 

■« PNLEM 99.22 97.99 96.03 94.26 91.83 88.85 85.91 83.16 79.38 77.35 

JSNLM 99.00 97.34 95.24 93.11 90.53 87.59 83.79 80.23 75.65 71.18 

PSJNLM 94.25 92.72 95.09 93.74 91.66 89.53 86.86 84.83 82.02 80.14 



framework (see example |13|) for multiple types of known noises and mixed noises may be developed. 
In addition, the proposed PNLM can also be extended to capture non i.i.d. noises, because one can easily 
to replace the p.d.f. of patch difference ^ with more general forms. For example, for Gaussian noises 
with changing variance, fc(Z)|(7^)Pr((T^) can be used in place of fi^k{D). The proposed PNLM also 
provides a theoretical basis to quantify patch similarities, so Eq. (|20]) can be used in other ways other 



than early termination. For example, the critical values predicted by ^ can also be used as thresholds 



to reject or accept a patch in the first stage of BM3D |14|. This choice provides a theoretically-based 
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Fig. 3: NLM and PNLM denoising results for a = 80 (cropped and enlarged from results of image checker), (a) 
clean image; (b) noisy observation; (c) to (h): denoising results and method noise images of NLM, PNLM, NLEM, 
PNLEM, JSNLM, and PJSNLM, respectively. 



alternative to the empirical hard thresholds in BM3D (i.e. Tmatch in Eq. (2) of 1 14 1). In our initial tests, 
we also see a performance improvement after using the probabilistic thresholds. 
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